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One of the challenges in fighting cybercrime is to understand the dynamics of message propagation 
on botnets, networks of infected computers used to send viruses, unsolicited commercial emails 
(SPAM) or denial of service attacks. We map this problem to the propagation of multiple random 
walkers on directed networks and we evaluate the inter-arrival time distribution between successive 
walkers arriving at a target. We show that the temporal organization of this process, which models 
information propagation on unstructured peer to peer networks, has the same features as SPAM 
arriving to a single user. We study the behavior of the message inter-arrival time distribution on 
three different network topologies using two different rules for sending messages. In all networks the 
propagation is not a pure Poisson process. It shows universal features on Poissonian networks and a 
more complex behavior on scale free networks. Results open the possibility to indirectly learn about 
the process of sending messages on networks with unknown topologies, by studying inter-arrival 
times at any node of the network. 

PACS numbers: 05.40.Fb, 05.40.-a, 64.60.aq, 89.75.Fb 



I. INTRODUCTION 

A botnet is a network of infected computers which are 
under command and control of a single person, the "bot- 
master" . Botnets are abstract overlay networks on top of 
the physical network topology. They are used for sending 
unsolicited commercial emails (SPAM), viruses, denial of 
service attacks, for stealing identity data, and for other 
sorts of cyber crime. Botnets are the primary security 
threat on Internet today [TJ . They grew up to be a global 
and multi million dollar business. Fighting botnets is a 
hard task since their structure is constantly evolving, and 
their inner working is not known. Understanding the dy- 
namics of communication on such networks is a big chal- 
lenge and could be crucial for finding effective tools to 
fighting botnets. This problem can be successfully ap- 
proached from the network theory perspective. 

Botnets can have different structures. The new gener- 
ations of botnets, which are more robust against attacks 
and very difficult to track, are based on peer to peer com- 
munication Q . This type of communication can be mod- 
eled as a random walk. When the botmaster sends an 
order to its bots, it sends it only to a fraction of nodes in 
the botnet, which can then forward it to only those bots 
whose IP addresses they know. These addresses are ran- 
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domly assigned to bots. There is no centralized point in 
such networks since all the nodes are equally important, 
they are all clients and servers at the same time. This is 
the reason for botnet's robustness against attacks. Even 
if a node of the botnet is identified, its communication 
with the rest of the botnet can be tracked back only to a 
limited number of bots. Therefore, no attack would de- 
stroy the whole botnet, or lead to the botmaster. Links 
in peer to peer networks do not have to be bidirectional, 
and the unstructured peer to peer networks, often used 
for constructing botnets since their are the most diffi- 
cult to track, have random topology. Therefore we will 
model botnets as random directed networks, and we will 
study the propagaton of random walkers on them in or- 
der to attack the problem of understanding the internal 
mechanisms of message propagation on these networks. 
Random walks on directed networks are also an interest- 
ing fundamental problem, important for understanding 
the communication in any other peer to peer network 
[H, wireless sensor networks ad hoc networks or 
different processes on world wide web, such as tagging 
[6|. The results we present are general, not botnet spe- 
cific, and are valid for any system in which data packets 
propagate in a random fashion on the directed network. 

Random walks and related stochastic processes have 
mainly been studied on regular lattices and d-dimensional 
euclidian spaces in the past Q , due to their obvious rele- 
vance to physical problems. In recent years networks are 
becoming the preferred model to study complex systems 
H Q and this triggered studies of random walks on them 
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10-24]. However, most of the previous results concern 
random walks on undirected networks. Random walks on 
directed networks have been mainly investigated to find 
communities in citation networks |25], identify subgraph 
structures on World Wide Web [26j , or in calculations of 
the PageRank [13, HH. This is a measure used by the 
search engine Google (as well as by several other search 
engines) to determine the prestige of Web pages. When 
a user submits a query, the hits returned by Google are 
ranked according to the value of their PageRank. The 
algorithm determining this value is based on a modified 
random walk on the web graph, were nodes are web pages 
and edges between them are naturally directed hyperlinks 
connecting web pages. In each step the modified random 
walker either follows a randomly chosen outgoing link of 
the present node, or with a small probability, called the 
damping factor, it jumps to a randomly chosen node in 
the network. 

We study random walks on directed networks to model 
peer to peer communication on botnets as spreading of 
messages. We are interested in the temporal organization 
of this dynamic process and therefore we investigate the 
distribution of inter-arrival times between two successive 
messages arriving to a given receiver. The inter-arrival 
time distribution has been first introduced to characterize 
the temporal occurrence of earthquakes [29| . It has been 
then studied in the contexts of different stochastic pro- 
cesses, as solar flares [30l - l33j . forest fires 34 1 or in package 
transport in computer science [35l [36[ . The interesting 
property of this quantity is that it is able to provide in- 
formation about the temporal organization of processes 
whose detailed mechanisms are unknown. In particular, 
it is a simple exponential if the process is Poissonian and 
therefore it is able to enlighten the presence of tempo- 
ral correlations among events, when a non-exponential 
behavior is detected. 

A recent paper [3?]], has investigated the statistical 
properties of the SPAM delivery inter-arrival times. Re- 
sults have suggested that SPAM messages delivered to 
a given recipient are time correlated: if the inter-arrival 
time between two consecutive SPAM messages is small 
(large) , then the next SPAM message will most probably 
arrive after a small (large) inter-arrival time. SPAM tem- 
poral correlations have been reproduced by a numerical 
model based on the random superposition of SPAM se- 
quences, each one described by the Omori law [38|- This 
and other experimental findings [37| suggest that sta- 
tistical approaches may be used to infer how spammers 
operate. 

Our motivation to study the distribution of message 
inter-arrival times on model networks is to detect the 
eventual presence of temporal correlations and their re- 
lation with the network topology. The inter-arrival time 
distribution of messages sent only to one or a small frac- 
tion of nodes could then provide information about the 
dynamical process taking place on a real network. In the 
case of botnets this would imply that we would be able 
to get information about their organization and structure 



by studying inter-arrival times of either SPAM emails or 
of contaminated packages, both sent by a botnet, by an- 
alyzing data even of a single user. These data are easily 
accessible, cheap and easy to monitor. Since botnets are 
difficult to identify, this indirect way of learning about 
their organization would give a boost in fighting botnets 
and cyber crime in general. 

The paper is organized as follows: In the next sec- 
tion we will describe the model and the implemented net- 
works. In the third section we discuss results obtained 
for random networks and show their comparison with the 
real data. In the following two sections we extend our in- 
vestigation to networks with random topologies without 
dead ends and to scale free networks. Finally in the last 
section we discuss the results and give some concluding 
remarks. 



II. MODEL 

We start by constructing a randomly connected di- 
rected network with a given degree distribution of inputs 
and outputs. We implement the Poisson distribution, the 
Poisson without dead ends and the power law distribu- 
tions. We choose a random node in the network to be 
the one where we measure the inter-arrival times between 
two successive random walker arrivals, dt = U + ± — £j. We 
call this node the target node or the receiver. The target 
node is chosen at random among nodes with a given num- 
ber of inputs. The number of outputs of the target is not 
fixed since it does not influence the number of message 
arrivals. Next, we choose a node from which messages 
depart, the sender. This node is chosen at random, with 
a fixed number of outputs. Since only the number of 
outputs determines the number of different ways a mes- 
sage can leave on its way to the receiver, we do not fix 
the number of inputs of a sender. In our process the 
sender is the botmaster sending orders to its bots, and 
what we measure at the receiver can be compared to the 
arrivals of messages to a generic user. We have found 
that increasing the number of senders does not affect the 
basic properties of the inter-arrival time distribution of 
messages reaching the receiver. 

After a message departs from the sender, it follows at 
each time step one randomly chosen outgoing link of the 
occupied node. The walk continues until either the target 
node is reached, or an initially fixed maximal number of 
steps is exceeded. It is necessary to introduce a limit 
on the number of steps since, due to the fact that the 
links are directed, the network may have regions that 
the walker can enter but cannot escape from. This limit 
also exists in real internet protocols where it is called 
the time to live (TTL). This is a limit on the number of 
transmissions that a data package can experience before 
it should be discarded. In all simulations presented here 
we fix this value to twice the size of the network. 

To simulate the dynamical process typical for a bot- 
master, who sends a large number of messages through 
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botnets, many random walkers depart from the sender. 
We send them either one by one or all at once. The walk- 
ers are independent. We record the times at which mes- 
sages arrive to the receiver and calculate the inter-arrival 
times between successive messages. When the messages 
are sent all at once, their arrival times depend solely on 
the length of the paths undertaken on their way to the 
receiver. This is mainly affected by the network's topol- 
ogy. When the messages are sent one after another, each 
consecutive message has an equal time delay in starting 
its walk to the target node. This process is more complex 
than the previous one since a message can arrive to the 
target before others sent earlier due to a shorter under- 
taken path. In the beginning of the process the number 
of messages arriving to the target increases with time, 
until a stationary state is reached. We are interested in 
the stationary state of the process, where we measure 
the distribution of inter-arrival times. In our model at 
each time step a new walker departs from the sender. We 
have, however, verified that the introduction of a longer 
time delay between successive departures does not affect 
the main properties of the distribution. 



III. RANDOM NETWORKS 

All the networks randomly connect input and output 
links assigned to the nodes according to some distribution 
[39l S3|. We call, however, the networks random only 
when the distribution is Poissonian, p(k) = ((k) k /k\) ■ 
eW , where k is the number of links and (k) its mean value 
[4ll |. When we construct random networks we choose 
both, the distribution of input and of output links, to be 
Poissonian with the same mean degree. The number of 
walkers has to be large enough to provide good statistics 
for the distribution of inter-arrival times. 

We sample data from 500 different network realiza- 
tions for a given degree distribution and fixed values of 
the sender outdegree and the receiver indegree. We fix 
both these values equal to 4. We have, however, verified 
that the specific value of these parameters does not affect 
the behavior of the distribution. The space of possible 
topologies and possible choices of a sender- receiver couple 
is extremely large. To get better statistics we also fix the 
distance between these two nodes, namely the shortest 
path between them. 

We first study the case when messages are sent one by 
one. We find that the distributions of inter-arrival times 
for networks with Poisson distributed links and different 
average degree (k) exhibit similar behavior (FigfTJ). Only 
for networks with a small value of the average degree, and 
therefore a lower level of connectivity, longer inter-arrival 
times are measured. The average inter-arrival time in- 
deed increases exponentially as the average connectivity, 
(k), decreases (inset Figj2]). In order to check if the dis- 
tribution is a universal function, solely controlled by the 
average rate R of walkers arri ving at the target, we verify 
the following scaling relation |42| 
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FIG. 1: Distribution of inter-arrival times in networks with 
Poisson degree distribution and different values of the mean 
degree. The networks have N = 10 4 nodes, the number of sent 
messages is M = 10 6 and the distance between the sender and 
the receiver is fixed to 8. The messages are sent one by one. 
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FIG. 2: Distributions of inter-arrival times in networks with 
Poisson degree distribution and different values of the mean 
degree, rescaled by the average rate of message arrivals. The 
networks have N — 10 4 nodes, the number of sent messages is 
M — 10 6 and the distance between the sender and the receiver 
is fixed equal to 8. The messages are sent one by one. The 
inset shows the dependence of the mean inter-arrival time on 
the average degree of the network. 



P(dt) = Rf(Rdt), 

Therefore we evaluate the average rate for each dis- 
tribution, as the inverse of the mean inter-arrival time 
R = 1/ < dt >, and rescale the inter- arrival time by 
the average rate. We find that the different distributions 
collapse quite well onto a universal curve (Fig. refpoiss- 
DiffKscaled) , if the total number of links in the network 
is large enough. Small deviations are observed only for 
large dt. If the network is too sparse, i.e. if (k) < 3, the 
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FIG. 3: Distributions of inter-arrival times in networks with 
Poisson degree distribution with mean degree equal to 4 and 
different distances between the sender and the receiver. The 
distributions are rescaled by the average rate of message ar- 
rival. The networks have TV = 10 4 nodes, the number of sent 
messages is M = 10 6 . The messages are sent one by one. The 
inset shows the dependence of the mean inter-arrival time on 
the distance between the sender and the receiver. 




FIG. 4: Rescaled distributions of inter-arrival times of SPAM 
emails sent from two different domains and sampled in three 
different junk mailboxes (empty symbols) from reference [37l ]. 
and of our model (full symbols) with messages sequentially 
sent through random networks with different (k) values and 
the distance between the sender and the receiver fixed to 8. 
The networks have N — 10 4 nodes, the number of sent mes- 
sages is M = 10 6 . 



mean inter-arrival time increases and the probability of 
longer dt becomes larger. This effect is caused by the fact 
that in sparse networks only a small number of messages 
reaches the target. The space of possible paths leading to 
it is not fully explored and the trapping regions in sparse 
networks are more prominent. 

The universal scaling function in Fig J2] exhibits an ini- 
tial almost constant regime followed by an exponential 
like decay. If the messages would arrive to the target 
independently of each other, inter-arrival times would 
be distributed exponentially, as it happens in Poissonian 
processes. In our process the distribution of inter-arrival 
times deviates from the exponential, which indicates that 
the process is more complex, and possibly correlations 
are present, coming from the networks topology or the 
message sending process itself. We have also studied the 
influence of the distance d between the sender and the 
receiver on the distribution of inter-arrival times. If the 
nodes are not too far away (for d < 8), the rescaling of 
the distributions by the average rate of message arrivals 
provides a good collapse (Figj3]) with fluctuations at large 
intertimes only for d = 8. The inset shows that the mean 
inter-arrival time grows linearly with the sender - receiver 
distance. 

The present model simulates propagation of messages 
in unstructured peer to peer botnets where the botmas- 
ter is sequentially sending a large number of messages 
to the bots. The dynamics of sending orders inside the 
botnets influences the dynamics of arrival of messages 
sent from bots to the final destination, which could be 
a computer of a simple user receiving SPAM. Therefore, 
the fingerprint of the dynamics of message propagation 
inside the botnets should be visible in the distribution of 



inter-arrival times of SPAM emails collected in the mail- 
box of a single user. We compare our results with the dis- 
tribution of inter-arrival times for SPAM data, presented 
in Ref. (37[. The experimental data in Fig 0] are sam- 
pled from three different junk mailboxes, and the spam 
emails are selected on the basis of their geographical ori- 
gin, Europe and the United States. Surprisingly, already 
our simple model is able to reproduce quite well the ba- 
sic characteristics of the inter-arrival time distribution of 
the real data, as can be seen in Figf4] 

In order to understand the characteristics of the dy- 
namical process, and thus the behavior of the inter- 
arrival time distribution for experimental data, we study 
in detail different aspects of the dynamics on model net- 
works. If the walkers are sent one after the other the 
inter-arrival time does not depend only on the different 
paths taken by the walkers but also on their starting 
time. To understand the effect of this time delay on the 
process we also analyze the case where all the messages 
are sent at the same time. In this case inter-arrival times 
are uniquely determined by the complexity of the un- 
dertaken paths. The distribution of inter-arrival times 
rescaled by the average rate shows universal behavior, 
well fitted by a power law with the exponent close to 2 
(Fig [5]). We show results for only the networks which 
are not too sparse, since for sparse networks the distri- 
bution shows large statistical fluctuations. The power 
law behavior suggests that even if the walkers are com- 
pletely independent during their propagation, they arrive 
in bursts originating temporal clustering in the process. 
The probability for the shortest dt depends on the level 
of connectivity in the network, moreover the higher the 
average degree the smaller dt is with respect to the aver- 
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FIG. 5: The rescaled distributions of inter-arrival times in 
networks with Poisson degree distribution with different mean 
degrees. All the messages are sent at the same time and 
the distance between the sender and the receiver is equal to 
8. The networks have N = 10 4 nodes, the number of sent 
messages is M = 10 6 . 



FIG. 6: The distributions of hitting times in networks with 
Poisson degree distribution with mean degree equal to 4 and 
different distances between the sender and the receiver. The 
networks have N = f0 4 nodes, the number of sent messages 
is M = 10 6 . The messages are sent all at once. The fit 
is a stretched exponential function of the form y = 0.02 • 
exp(-0.02 • a; ' 96 ) . 



age rate. In general, we observe that the average degree 
controls the extension of the scaling regime: the more 
inter-connected is the network the wider is the variety of 
possible paths and therefore the range of observed dt. 

Since the outputs of random networks are distributed 
according to the Poisson degree distribution, a fraction 
of nodes in the network has k = outputs. Such nodes 
exist also in real networks. These nodes serve as a trap 
for the random walker. Similarly to the trapping prob- 
lem on networks [43|, the message reaching such a node 
cannot proceed any farther. In random networks this is a 
dominant mechanism for preventing messages from find- 
ing the target. A large number of messages gets lost and 
the inter-arrival times can become extremely large. It is 
highly unlikely that a message will be stopped because 
the length of its path has reached the limit given by TTL. 
It rather appears that on random networks messages ei- 
ther reach the target after a relatively short period of 
time, or they never reach it. We check this point by 
studying the distribution of hitting times. The hitting 
time, or the first passage time, is the time that a random 
walker takes to reach the target for the first time. In our 
model the distribution of hitting times is equivalent to 
the distribution of the lengths of the paths taken by the 
random walkers to reach the target node. In Fig|B]we see 
that only a small number of walkers takes the maximal 
number of 1000 steps, which is relatively small for the 
N = 10 4 networks. The most probable hitting time has 
a value equal to the distance between the sender and the 
receiver, meaning that many messages take the shortest 
path between the two nodes. Its probability is higher 
for smaller distances. The other possible paths are dis- 
tributed according to a stretched exponential distribution 
(FigJBJ) independent of the sender - receiver distance. 



IV. RANDOM NETWORKS WITHOUT DEAD 
ENDS 

In order to better understand the influence of nodes 
without outputs on the distribution of inter-arrival times, 
we study the same dynamical process on networks with 
slightly different topology. When we assign the number 
of ingoing or outgoing links to a node, we choose a ran- 
dom number between zero and N — 1 from the Poisson 
distribution, but we assign to the node this number plus 
one. In this way there are no nodes with zero ingoing or 
outgoing links and there are no dead ends in the network. 
The only trap in the network is now the target node. 

For the process where walkers are sent one by one we 
find that the distributions depend weekly on the average 
degree and the distance between the sender and the re- 
ceiver. Indeed, the average rate varies on a much smaller 
range (insets in Fig.d and |5J). Very good collapse is 
therefore observed rescaling the distributions by the av- 
erage rate (Fig.^7]and|8]). The universal function behaves 
as a stretched exponential and is therefore different than 
the one in Figj^l The main mechanism preventing the 
message from arriving to the target is now time exceed- 
ing the TTL limit. Many messages arrive to the target, 
which results in small inter-arrival times. In this case 
the walkers explore most of the paths existing between 
the two nodes. Similarly to previous cases, we show the 
results only for networks which are not too sparse, where 
the distributions show larger statistical fluctuations. 

Conversely, for the process where all messages are sent 
at once, the inter-arrival times are in the majority of 
cases either zero, i.e. two messages arrive to the target 
at the same time, or equal to one. This is due to the 
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FIG. 7: The distribution of inter-arrival times in networks 
with Poisson degree distribution without dead ends and with 
different values of the mean degree, rescaled by the average 
rate of message arrival. The networks have N — 10 4 nodes, 
the number of sent messages is M — 10 6 and the distance 
between the sender and the receiver is fixed equal to 5. The 
messages are sent one by one. The fit is a stretched exponen- 
tial function of the form y = 2.2 ■ exp(— f .5 ■ a; ' 8 ) . The insert 
shows the dependence of the mean inter-arrival time on the 
average degree of the network. 



FIG. 8: The distribution of inter-arrival times in net- 
works with Poisson degree distribution without dead ends and 
with different distances between the sender and the receiver, 
rescaled by the average rate of message arrivals. The mean 
of the Poissonian distribution is (fe) = 4. The networks have 
N = I0 4 nodes, the number of sent messages is M = 1 6 and 
the distance between the sender and the receiver is fixed equal 
to 5. The messages are sent one by one. The inset shows the 
dependence of the mean inter-arrival time from the distance 
between the sender and the receiver. 



large number of messages arriving to the target and to 
the ability of the walkers to explore well the space of all 
possible paths, with lengths ranging between the shortest 
path and the TTL. At each time step then at least one 
walker arrives to the target leading to an inter-arrival 
time equal to one. 

The distribution of hitting times when all messages 
are sent at once is also quite different than in the case 
where dead ends exist. In this case the walkers have the 
possibility to sample paths of all lengths and therefore 
the hitting time can assume values up to the threshold 
TTL. As we can see in FigJSl the distribution has an 
exponential behavior. Deviations from the exponential 
function can be seen only for small values of the hitting 
time, with this region getting smaller when the connec- 
tivity is larger. The coefficient of the exponential dis- 
tribution increases with (fc), namely the probability for 
longer hitting times is higher for larger average degrees. 
The walker takes more tortuous paths in a network with 
a higher level of connectivity. Conversely, the coefficient 
is independent of the distance between the sender and 
the receiver (inset FigJS]). 

The results obtained for random networks without 
dead ends confirm our conclusion that the behavior of 
the inter-arrival time distribution for random networks 
is a consequence of the existence of dead ends, the nodes 
without outputs which serve as traps for the messages. 
Since these traps exist in real networks, it is important 
to understand their influence on propagation of messages 
through random networks. 
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FIG. 9: The distributions of hitting times in networks with 
Poisson degree distribution without dead ends and for differ- 
ent (k). The distance between the sender and the receiver 
is equal to 5. The inset shows the same distribution for the 
average degree equal to 4 and different distances between the 
sender and the receiver. The networks have TV = 10 4 nodes, 
the number of sent messages is M = I0 6 . 



V. SCALE-FREE NETWORKS 

To explore further the influence of the network topol- 
ogy on the propagation of random walkers, we study this 
process on scale free networks. We find that in this case 
the characteristics of the process are much different. In 
order to measure the distribution we wait for the process 
to become stationary. Whereas in the case of Poisson 
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FIG. 10: The distribution of inter-arrival times in networks 
with power law degree distribution with different values of co- 
efficient 7 (P(k) oc fc -7 ). The networks have TV = 10 4 nodes, 
the number of sent messages is M — 10 6 and the distance 
between the sender and the receiver is fixed equal to 30. The 
messages are sent one by one. The inset shows the dependence 
of the mean inter-arrival time on 7. 



distributed links this happens very fast, for scale free 
networks the stationary regime is reached after a long 
transient. A large number of messages has to be sent to 
obtain good statistics. 

In Fig llOl we show the distribution of inter-arrival times 
for scale-free networks with different exponents of the 
degree distribution (P(k) oc fc~ 7 ) and in the case that 
messages are sent one by one. We observe that sparse 
networks (high 7) behave differently than well connected 
networks. In fact, the mean inter-arrival time rapidly 
increases for decreasing 7, suggesting that the walker 
takes very tortuous paths in a well connected network. In 
contrast to the previous cases, the distributions of inter- 
arrival times for scale free networks do not collapse onto 
a universal curve if the inter-arrival time is rescaled by 
the average rate of the process, even when the networks 
are not sparse. Therefore the average rate is not the only 
relevant quantity in the process. 

The number of messages arriving to the target is 
smaller than in Poisson networks without dead ends, and 
the inter-arrival times can be extremely long. In scale 
free networks there are by definition no dead ends, and 
the main mechanism for stopping the random walker is 
here time exceeding the TTL limit. The walkers take 
many long paths, probably looping through system and 
being able to visit different parts of the networks through 
shortcuts whose probability is higher due to hubs. In con- 
trast to Poisson networks without dead ends, although 
the walkers can explore well the space of possible paths, 
many of them never reach the target since they are ei- 
ther trapped in loops, or in regions typical for directed 
networks, where the walker can enter but cannot escape 
from. Moreover, the path to the target could be longer 
than the TTL limit, which is very probable on scale free 




FIG. 11: The distributions of inter-arrival times in networks 
with scale free distribution with different coefficients of the 
power law. All the messages are sent at the same time and 
the distance between the sender and the receiver is 30. The 
networks have TV = 10 4 nodes, the number of sent messages 
is M = 10 6 . 

networks. Since the number of walkers reaching the tar- 
get decreases, the probability of longer inter-arrival times 
increases. 

When messages are sent all at the same time, the distri- 
bution of inter-arrival times shows a power law behavior. 
However, in contrast to random networks with dead ends, 
where we find universal power law behavior, a change in 
the topology by tuning the coefficient 7 changes the slope 
from about 0.8, in well connected, to 2.2, in sparse net- 
works fFig fTT]) . As in the case of consecutive departures 
of messages, also here we do not observe the distribution 
collapse if inter-arrival time is rescaled by the average 
rate. 

The hitting times of the dynamical process on scale free 
networks also show a quite different behavior. In FigfT2l 
we see that networks with different power law coefficients 
have different distributions of hitting times. We also see 
that, similarly to the case of Poisson networks without 
dead ends, the walkers on scale free networks explore 
well the space of all possible paths from shortest path to 
TTL, especially in the case of well connected networks 
(smaller 7 exponents). By increasing 7, the number of 
walkers which take very long paths to reach the target 
decreases. 



VI. CONCLUSIONS 

In this paper we study the distribution of inter-arrival 
times of walkers sent through complex networks, with 
the goal to gain understanding on the process of infor- 
mation spreading inside the new generation of botnets 
based on peer to peer communication. The dynamics of 
sending botmaster orders inside the botnet influences the 
dynamics of arrival of messages sent from bots to the final 
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hitting time 

FIG. 12: The distributions of hitting times in networks with 
power law degree distributions with different coefficients 7. 
The distance between the sender and the receiver is fixed to 
30. The networks have N = 10 4 nodes, the number of sent 
messages is M = 10 6 . 

destination. This can be a computer of a simple user re- 
ceiving either SPAM or data packages contaminated with 
viruses. The dynamics of message propagation inside the 
botnets therefore affects the distribution of inter-arrival 
times of SPAM emails collected in the mailbox of a sin- 
gle user. We compare the results obtained by modeling 
botnets as random directed networks, where messages 
sent sequentially from the botmaster are random walk- 
ers, with the distribution of inter-arrival times of real 
SPAM data, presented in the Ref. [37|. The comparison 
shows that this simple model reproduces well the basic 
features of inter-arrival time distribution. 

To better understand the behavior of the distribution 
of the inter-arrival times we study the dynamical process 
of message propagation on different model networks and 
by two different procedures. We find that the main ingre- 
dients controlling the distribution of inter-arrival times 
are the distribution of possible path lengths between the 
sender and the receiver, and the number of messages not 
reaching the target. Possible paths between two nodes 
are determined only by the network topology while the 
mechanisms preventing messages from reaching the tar- 
get depend in addition on TTL. In the case of random 
networks, nodes without outputs represent natural traps 
for the messages, and most of the sent messages are pre- 
vented from reaching the target leading to long inter- 
arrival times. Since such nodes exist in real networks, this 
is the situation that we expect to observe in real peer to 
peer botnets. The distribution of inter-arrival times for 
the sequential sending of messages shows an almost con- 
stant regime for small inter-arrival times, followed by an 
exponential like cutoff including non-vanishing probabil- 
ity for very long inter-arrival times. When the messages 
are sent in parallel the inter-arrival times are power law 
distributed up to long inter-arrival times. 

We confirm that the nodes without outputs have a 



crucial role in processes on random networks by study- 
ing networks with the same Poisson distribution of links, 
but without dead ends. In this case inter-arrival times 
are much shorter and are distributed as a stretched ex- 
ponential for sequentially sent messages. When the mes- 
sages are sent in parallel only trivial values of inter-arrival 
times, zero or one, appear. Many messages arrive to the 
target, the space of possible paths is well explored and 
only those few messages exceeding the limit of maximal 
number of steps are prevented from reaching the target. 

Networks with a scale free distribution of links also 
show the important role of the network topology for the 
behavior of the inter-arrival time distribution. In these 
networks long inter-arrival times appear, but the mes- 
sages are prevented from reaching the target by different 
mechanisms. Here the limit on the maximal number of 
steps, together with the distribution of possible paths 
between the sender and the receiver, are determining 
the distributions of inter-arrival times. The inter-arrival 
times of sequentially sent messages can be very long in 
the case of well connected networks, or much shorter for 
the less connected networks (higher 7 values), but the 
behaviour of the distributions are in all cases different 
than for the other two network types. For messages sent 
in parallel this distribution is a power law, as in the case 
of random networks, but with the slope depending on the 
network's connectivity. From the three types of networks 
investigated, only for scale free networks the distributions 
do not collapse onto a universal curve if dt is rescaled by 
the average rate of the process. 

The change in topology has a different influence on the 
dynamical process on networks with Poisson and scale 
free distributed links. For scale free networks we change 
the topology by varying the 7 exponent of the link distri- 
bution. This change has a strong influence on the distri- 
bution of message inter-arrival times. The average rate 
is not the only relevant quantity for the process and the 
universality class of the distribution of inter-arrival times 
depends on the topology. Tuning the exponent 7 affects 
the number of hubs in the network, which in scale free 
networks play a crucial role in the process of message 
spreading. Their number changes qualitatively the be- 
havior of the inter-arrival time distribution. It influences 
the length of possible paths between nodes in the net- 
work not only by the change of local properties, such as 
the number of links of a node, but also by the creation of 
long range shortcuts through hubs, which increases the 
number of possible paths between the nodes. 

When the link distribution is Poissonian, changing the 
mean number of links per node modifies the network 
topology and therefore the mean rate of walker arrivals. 
The inter-arrival time distributions, however, collapse 
onto a universal scaling function if inter-arrival time is 
rescaled by the average rate. The distance between the 
sender and the receiver is also not affecting the universal- 
ity class of the universal feature of the distribution. The 
behavior of the distribution is also robust with respect to 
changes of other parameters, such as the time distance 
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between two sequential messages, the number of output 
links of the sender and input links of the receiver, or even 
the number of senders. The robustness of the behavior 
of the inter-arrival time distribution seems to be typical 
for networks with Poisson distributed links. Interestingly, 
the SPAM data, analyzed in terms of the junk mailbox, or 
by the geographical location of IP addresses of the sender, 
also collapse onto a universal function when inter-arrival 
time is rescaled by the average rate. This new approach 
in studying such processes using network theory can be 
employed in many fields. By applying this approach to 



directed networks we show that we can learn about bot- 
nets indirectly. The results are not botnet specific and 
can be applied to any other system which can be modeled 
by directed networks and through which the information 
propagates in random fashion. 
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